Abstract

The error floor phenomenon observed with LDPC codes and their graph-based, iterative, message- 
passing (MP) decoders is commonly attributed to the existence of error-prone substructures - variously 
referred to as near codewords, trapping sets, absorbing sets, or pseudocodewords - in a Tanner graph 
representation of the code. Many approaches have been proposed to lower the error floor by designing 
new LDPC codes with fewer such substructures or by modifying the decoding algorithm. In this paper, 
we show that the source of the error floors observed in the literature could be imprecise implementation 
of the iterative MP decoding algorithms and the message quantization rules used. We then propose a 
new quantization method to overcome the limitations of conventional quantization rules. Performance 
simulation results for two LDPC codes commonly found to have high error floors when used with 
fixed-point iterative MP decoding algorithms provide an example of the practical application of our 
idealized theoretical results and the effectiveness of the proposed quantization method. 
I. Introduction

The outstanding performance of low-density parity-check (LDPC) codes and iterative,
message-passing (MP) decoding algorithms [1], [2] has attracted considerable attention over the past
decade, and these techniques are being deployed in a growing number of practical applications.
At high signal-to-noise ratio (SNR), however, LDPC codes and MP decoders may be subject 
to the error floor phenomenon, which manifests itself as an abrupt change in the slope of the 
error-rate curve. Since many important applications, such as data storage and high-speed digital 
communication, often require extremely low error rates, the study of error floors in LDPC codes 
remains of considerable practical, as well as theoretical, interest. 

The error floor phenomenon is commonly attributed to the existence of certain error-prone 
substructures (EPSs) in a Tanner graph representation of the code. In the binary erasure channel
(BEC), it has been shown that substructures known as stopping sets determine the error-rate
performance and the observed error floor [3]. However, for general memoryless binary-input output-
symmetric (MBIOS) channels such as the binary symmetric channel (BSC) and the additive white
Gaussian noise channel (AWGNC), the EPSs that dominate the error floor performance have not
yet been fully characterized, although some classes of EPSs have been identified and studied,
such as near-codewords [4], trapping sets [5], absorbing sets [6], and pseudocodewords [7].

One common way to improve the error floor performance of LDPC codes has been to redesign 
the codes to have Tanner graphs with large girth and without small EPSs [8]-[10]. However,
for LDPC codes that have been standardized, approaches are needed that do not modify the 
codes. In the literature, many modifications to the iterative MP decoding algorithms have been 
proposed in order to improve high-SNR performance, such as averaged decoders [11], reordered
decoders [12], [13], and decoders with post-processing [14]-[18]. In [11], the authors noticed
that the emergence of errors in EPSs is heuristically related to a sudden magnitude change in 
the values of certain variable nodes (VNs). Hence, it was proposed to average the messages 
in a belief-propagation (BP) decoder over several iterations to avoid such sudden changes and 
therefore slow down the convergence rate for variable nodes in a trapping set and decrease the 
frequency of trapping set errors. Another heuristic approach is to process messages based on 
the order of node reliabilities computed at each iteration [12], and it was suggested that the
scheduled decoders are able to resolve some standard trapping set errors [13]. Although these
general approaches are capable of improving the average error rate performance to some extent, 
the resulting decoders still fail on small EPSs and their effect on the error floor is not significant. 
To further improve the error floor behavior, decoders that make use of the prior knowledge of 
some small-size EPSs are designed to reduce the decoding failures due to such EPSs. In [14] and
[15], the authors proposed a post-processing decoder that matches the configuration of unsatisfied
check nodes (CNs) to trapping sets in a precomputed list after conventional MP decoding has 
failed. The size and completeness of the trapping set list directly affect the performance gain 
of such decoders, but obtaining a complete list of small trapping sets of a given LDPC code is
generally computationally expensive. A symbol-selecting post-processing technique was also
developed in [16]. It saturates the channel messages on a set of selected variable nodes at each
stage after the conventional MP algorithm fails. In [17], Han and Ryan proposed a bi-mode
erasure decoder that combines several problematic check nodes into a generalized constraint 
processor, to which a corresponding maximum a posteriori (MAP) algorithm, such as the BCJR 
algorithm, is then applied. Another post-processing approach that utilizes the graph-theoretic 
structure of absorbing sets, proposed in [18], adjusts the appropriate messages in the iterative
MP decoding once the decoder enters and remains in the absorbing set of interest. All the above 
approaches either change the message update rules of MP decoders or require extra processing 
steps after conventional MP decoding fails, both of which increase the decoding complexity 
relative to the original iterative MP algorithms. Moreover, the post-processing approaches that 
require prior knowledge of the set of EPSs causing the error floor are only effective when applied 
to LDPC codes whose EPSs have been carefully studied. 

In fixed-point implementations of iterative MP decoding, efforts have also been made to
improve the error-rate performance in the waterfall region and/or error-floor region by optimizing
the parameters of uniform quantization [19]-[22]. In [19], Zhao et al. studied the effect of message
clipping and uniform quantization on the performance of the min-sum decoder in the waterfall 
region and heuristically optimized the number of quantization bits and the quantization step 
size for selected LDPC codes. In [20], a dual-mode adaptive uniform quantization scheme
was proposed to better approximate the log-tanh function used in sum-product algorithm (SPA)
decoding. Specifically, for magnitudes less than 1, all quantization bits were used to represent
the fractional part; for magnitudes greater than or equal to 1, all bits were dedicated to the
representation of the integer part. In [21], [22], Zhang et al. proposed a conceptually similar
idea to increase precision in the quantization of the log-tanh function. Uniform quantization was 
applied to messages generated by both variable nodes and check nodes, but the quantization step 
sizes used in the two cases were separately optimized. We note, however, that none of these 
modified quantization schemes were primarily intended to significantly increase the saturation 
level, or range, of quantized messages, and in their reported simulation results, error floors can 
still be clearly observed. 

It was first noticed in [23] that the high error floors associated with certain EPSs of some
LDPC codes are closely related to the saturation level imposed on messages passed in the SPA 
decoder. In this work, we investigate the cause of error floors in binary LDPC codes from 
the perspective of the MP decoder implementation, with special attention to limitations that 
decrease the numerical accuracy of messages passed during decoding. We show that, under 
certain assumptions, the EPSs which are commonly associated with high error floors of some 
LDPC codes will not trap iterative MP decoders and cause high error floors if messages are 
accurately represented and there is no limitation on the number of iterations. Based upon an 
analysis of the growth rate of messages outside an EPS in an idealized scenario, we propose
a novel quasi-uniform quantization method that captures the essence of messages in different 
ranges of reliability. The proposed quantization method has an extremely large saturation level 
which prevents iterative MP decoders from being trapped by an EPS. This property, to the best of 
our knowledge, distinguishes it from other quantization techniques for iterative MP decoding that
have appeared in the literature. With the new quantization method, it is possible to have a
fixed-point implementation of iterative MP decoders that achieves low error floors without an additional
post-processing stage or a modification of either the decoding update rules or the graphical code 
representation upon which the iterative MP decoder operates. We present simulation results for 
min-sum decoding, SPA decoding, and some of their variants, that demonstrate a significant
reduction in the error floors of two representative LDPC codes, with no increase in decoding 
complexity. 

The remainder of the paper is organized as follows. Section II gives some notation and
definitions used throughout the paper. In Section III, we analytically investigate the impact that
message quantization can have on MP decoder performance and the error floor phenomenon. In
Section IV, we propose an enhanced quantization method intended to overcome the limitations
of traditional quantization rules. In Section V, we incorporate the new quantizer into SPA and
min-sum decoding and, through computer simulation of several LDPC codes known for their
high error floors, demonstrate the significant improvement in error-rate performance that this
new quantization approach can afford. Section VI concludes the paper.

II. Notation and Definitions 

The study of the phenomenon of error floors began shortly after LDPC codes were rediscovered 
about a decade ago. It has been shown that the EPSs known as stopping sets cause the error 
floor in the binary erasure channel (BEC), and such EPSs have a clear combinatorial description. 
Enumeration of these structures makes it possible to accurately estimate the error floor [3].
However, for other MBIOS channels such as the binary symmetric channel (BSC) and the
additive white Gaussian noise channel (AWGNC), it is more difficult to establish the relationship
between EPSs and error floors. In [4], it was first pointed out that near-codewords caused
error floors in simulations of Margulis and Ramanujan-Margulis LDPC codes on the AWGNC.
The term trapping set, proposed by Richardson [5], is operationally defined as a subset of variable
nodes (VNs) that is susceptible to errors under a certain iterative MP decoder over an MBIOS
channel. Hence, this concept depends on both the channel and the decoding algorithm. In [6],
the error floor is associated with some combinatorial substructures within the Tanner graph,
named absorbing sets, which are defined independently of the channel. The absorbing sets
correspond to a particular type of near-codewords or trapping sets that are stable under bit-
flipping operations. All these EPSs have been believed to be the cause of error floors, and for
some LDPC codes, techniques such as importance sampling used to estimate the error floor are
based on the probability of decoding failures on such EPSs [5], [24]. In this section, we will
show that under certain assumptions about the correctness of variable nodes outside a given 
EPS in the Tanner graph, conventional iterative decoders that accurately represent messages will 
eventually correct errors supported by the EPS. 

To facilitate our discussion, we define a substructure called an absolute trapping set from 
a purely graph-theoretic perspective, independent of the channel and the decoder. Let
$G = (V \cup C, E)$ denote the Tanner graph of a binary LDPC code with VNs $V = \{v_1, \ldots, v_n\}$, CNs
$C = \{c_1, \ldots, c_m\}$, and edge set $E$.

Definition 1: A stopping set of size $a$ is a configuration of $a$ variable nodes such that the
induced subgraph has no check nodes of degree one. An $(a, b)$ trapping set is a configuration
of $a$ variable nodes for which the induced subgraph is connected and has $b$ odd-degree check
nodes. If the induced subgraph of an $(a, b)$ trapping set does not contain a stopping set and has
at least one check node of degree one, it is called an absolute trapping set.
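
Definition 1 can be checked mechanically on small examples. The following Python sketch is only an illustration under stated assumptions: the adjacency dictionary `toy_cn_adj` and all function names are hypothetical, the stopping-set test enumerates every subset (so it is practical only for small $a$), and connectivity of the induced subgraph is assumed rather than verified.

```python
from itertools import combinations


def induced_check_degrees(cn_adj, vn_subset):
    """Return {check node: degree} in the subgraph induced by vn_subset."""
    subset = set(vn_subset)
    degrees = {}
    for cn, vns in cn_adj.items():
        d = sum(1 for v in vns if v in subset)
        if d > 0:
            degrees[cn] = d
    return degrees


def is_stopping_set(cn_adj, vn_subset):
    """True if no check node has degree one in the induced subgraph."""
    degrees = induced_check_degrees(cn_adj, vn_subset)
    return bool(degrees) and all(d != 1 for d in degrees.values())


def trapping_set_parameters(cn_adj, vn_subset):
    """Return (a, b): number of VNs and number of odd-degree check nodes."""
    degrees = induced_check_degrees(cn_adj, vn_subset)
    a = len(set(vn_subset))
    b = sum(1 for d in degrees.values() if d % 2 == 1)
    return a, b


def is_absolute_trapping_set(cn_adj, vn_subset):
    """Brute-force check of the two extra conditions in Definition 1.

    Connectivity of the induced subgraph is assumed, not verified here.
    """
    vns = sorted(set(vn_subset))
    degrees = induced_check_degrees(cn_adj, vns)
    has_degree_one_check = any(d == 1 for d in degrees.values())
    contains_stopping_set = any(
        is_stopping_set(cn_adj, sub)
        for r in range(1, len(vns) + 1)
        for sub in combinations(vns, r)
    )
    return has_degree_one_check and not contains_stopping_set


# Hypothetical toy graph: VNs 0, 1, 2 form a triangle through checks
# "c01", "c12", "c02"; each VN also has a private check shared with an
# outside VN (3, 4, 5), which has degree one in the induced subgraph.
toy_cn_adj = {
    "c01": [0, 1], "c12": [1, 2], "c02": [0, 2],
    "c0": [0, 3], "c1": [1, 4], "c2": [2, 5],
}
print(trapping_set_parameters(toy_cn_adj, [0, 1, 2]))
print(is_absolute_trapping_set(toy_cn_adj, [0, 1, 2]))
```

Removing the three private checks from the toy graph leaves every induced check with degree two, so the same three VNs would then form a stopping set rather than an absolute trapping set.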

In the literature, all trapping sets of interest that cause the error floor of an LDPC code are 
of size smaller than the minimum stopping set size of the code, since otherwise the error-floor 
would be dominated by the stopping sets [3]. By requiring at least one check node of degree
one, we exclude stopping sets from our definition of absolute trapping set. As we will discuss 
later in this section, these degree-one check nodes are essential because they are able to pass 
correct extrinsic messages into the trapping set. To the best of our knowledge, almost all trapping 
sets of interest in the literature are absolute trapping sets. For example, both of the well-known 
(5,3) trapping sets in the Tanner code of length 155, the notorious (12,4) trapping sets in the 
(2640,1320) Margulis code, and the (5,5) trapping set in some codes of variable-degree five are 
all elementary trapping sets, as shown in Fig. 1, where check nodes of degree one are shaded.
Unless otherwise indicated, all trapping sets referred to in this paper are elementary trapping 
sets as well. 

In analogy to the definition of a computation tree in [25], we define a $k$-iteration computation
tree as follows. 
(a) $a = 5$, $b = 3$   (b) $a = 12$, $b = 4$   (c) $a = 5$, $b = 5$

Fig. 1. Examples of $(a, b)$ absolute trapping sets. Check nodes in set $C_1$ and variable nodes in set $V_1$ are shown shaded.

Definition 2 ($k$-iteration computation tree): A $k$-iteration computation tree $T_k(v)$ for an
iterative decoder in the Tanner graph $G$ is a tree graph constructed by choosing variable node $v \in V$
as its root and then recursively adding the edges and leaf nodes that participate in the
iterative message-passing decoding during $k$ iterations. To each vertex that is created in $T_k(v)$,
we associate the corresponding node update function in $G$.

Let $S$ be the induced subgraph of an $(a, b)$ trapping set contained in $G$, with VN set $V_S \subseteq V$
and CN set $C_S \subseteq C$. Let $C_1 \subseteq C_S$ be the set of degree-one CNs in the subgraph $S$, and
let $V_1 \subseteq V_S$ be the set of neighboring VNs of CNs in $C_1$. In the (5,3), (12,4), and (5,5)
trapping sets shown in Fig. 1, the check nodes in set $C_1$ and the variable nodes in set $V_1$ are
shaded. We refer to a message on an edge adjacent to VN $v$ as a correct message if its sign
reflects the correct value of $v$, and as an incorrect message otherwise. Let $D(u)$ be the set of
all descendants of the vertex $u$ in a given computation tree.

Definition 3: Given a Tanner graph $G$ and an induced subgraph $S$ of a trapping set, a variable
node $v \in V_1$ is said to be $k$-separated if, for at least one of its neighboring degree-one check
nodes $c \in C_1$ in $S$, no variable node $v' \in V_S$ belongs to $D(c) \subset T_k(v)$. If every $v \in V_1$ is
$k$-separated, the induced subgraph $S$ is said to satisfy the $k$-separation assumption.

In Fig. 2(a), we show the graph of a (4, 4) trapping set and some of its neighboring nodes.
The set of VNs in the trapping set is $V_S = \{v_1, v_2, v_3, v_4\}$, represented as solid black circles.
The set of CNs in the trapping set is $C_S = \{c_i\}$, $1 \le i \le 8$. In this trapping set, every VN has a
neighboring degree-one CN, i.e., $V_1 = V_S$, and $C_1 = \{c_1, c_2, c_3, c_4\}$. For example, the 3-iteration
computation tree of VN $v_1$ is shown in Fig. 2(b). It can be verified from this computation tree
that $v_1$ is 2-separated but not 3-separated, because a VN in $V_S$ appears among the descendants of
$c_1$ in $T_3(v_1)$, but not in $T_2(v_1)$. It is worth noting that whether or not a trapping set satisfies the
$k$-separation assumption depends on the Tanner graph outside the trapping set, not the trapping set itself.

(a) (4, 4) trapping set and part of its neighboring nodes   (b) Computation tree with root $v_1$

Fig. 2. Example of a (4, 4) trapping set and its corresponding computation tree.

We want to point out that the $k$-separation assumption is much weaker than the isolation
assumption in [26]. The separation assumption here only applies to the VNs that have neighboring
degree-one CNs in the induced subgraph $S$, and it only requires that these neighboring degree-one
CNs do not have any VNs from the trapping set as their descendants in the corresponding
$k$-iteration computation tree. With the separation assumption, the descendants of $c \in C_1$ are
separated from all the nodes in the trapping set, meaning that the incorrect messages passed in the
trapping set do not affect the extrinsic messages sent towards $c$ in the computation tree.
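
The $k$-separation test of Definition 3 can be sketched by unrolling the computation tree below a degree-one check node. The Python code below is a minimal illustration, not the authors' procedure: the function names and the toy graph are hypothetical, and it assumes that $k$ iterations correspond to $k$ levels of check-to-variable expansion below the chosen check node, which is one possible reading of the depth convention.

```python
def vns_below_check(vn_adj, cn_adj, root_vn, check, k):
    """VNs among the descendants of `check` in a tree rooted at root_vn.

    Assumption: k iterations unroll k levels of (check -> variable) edges.
    Each tree vertex excludes only its own parent, as in a computation tree.
    """
    found = set()
    frontier = [(check, root_vn)]  # (check node, parent variable node)
    for _ in range(k):
        next_frontier = []
        for cn, parent in frontier:
            for vn in cn_adj[cn]:
                if vn == parent:
                    continue
                found.add(vn)
                for cn2 in vn_adj[vn]:
                    if cn2 != cn:
                        next_frontier.append((cn2, vn))
        frontier = next_frontier
    return found


def is_k_separated(vn_adj, cn_adj, v, degree_one_checks, trapping_vns, k):
    """True if some degree-one check of v has no trapping-set VN below it."""
    trapping = set(trapping_vns)
    return any(
        not (vns_below_check(vn_adj, cn_adj, v, c, k) & trapping)
        for c in degree_one_checks
    )


# Hypothetical toy graph: trapping set {0, 1}; check "c1" has degree one in
# the induced subgraph and connects VN 0 to the outside VN 2, which in turn
# reaches VN 1 through check "c2".
toy_vn_adj = {0: ["c0", "c1"], 1: ["c0", "c2"], 2: ["c1", "c2"]}
toy_cn_adj = {"c0": [0, 1], "c1": [0, 2], "c2": [1, 2]}

print(is_k_separated(toy_vn_adj, toy_cn_adj, 0, ["c1"], {0, 1}, k=1))
print(is_k_separated(toy_vn_adj, toy_cn_adj, 0, ["c1"], {0, 1}, k=2))
```

In this toy graph the outcome depends only on how quickly the nodes outside the trapping set lead back into it, which mirrors the remark that separation is a property of the surrounding Tanner graph.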
III. Error Floors of LDPC Codes



A. Trapping Sets and Min-Sum Decoding 

To get further insight into the connection between trapping sets and decoding failures of 
iterative MP decoders, we first consider a simple iterative MP decoder, the min-sum (MS) 
decoder, which can be viewed as a simple approximation of the sum-product algorithm. We now 
briefly recall the VN and CN update rules of min-sum decoding. 

A VN $v_i$ receives an input message $L_i^{ch}$ from the channel, typically the log-likelihood ratio
(LLR) of the corresponding channel output, defined as follows:

$$L_i^{ch} = \log \frac{\Pr(R_i = r_i \mid c_i = 0)}{\Pr(R_i = r_i \mid c_i = 1)}, \tag{1}$$

where $c_i \in \{0, 1\}$ is the code bit and $r_i$ is the corresponding received symbol.

For the BSC, the inputs and outputs of the channel are binary, and the input LLR to the
decoder is

$$L_i^{ch} = (-1)^{r_i} \log \frac{1 - p}{p}, \tag{2}$$

where $p$ is the channel error probability. From (2), we can see that, for a fixed $p$, the LLRs of
channel outputs have the same magnitude.

For the AWGNC, assume the transmitted symbols $u_i \in \{-1, 1\}$ are binary antipodal. The
received symbol is $r_i = u_i + n_i$, where $n_i$ is independent and identically distributed (i.i.d.)
Gaussian noise with zero mean and variance $\sigma^2$. Hence, we have

$$L_i^{ch} = \frac{2}{\sigma^2} r_i. \tag{3}$$

Denote by $L_{i \to j}$ and $L_{j \to i}$ the messages sent from $v_i$ to $c_j$ and from $c_j$ to $v_i$, respectively,
and denote by $N(k)$ the set of neighboring nodes of VN $v_k$ (or CN $c_k$). Then, the message sent
from $v_i$ to $c_j$ in min-sum decoding is given by

$$L_{i \to j} = L_i^{ch} + \sum_{j' \in N(i) \setminus j} L_{j' \to i}, \tag{4}$$

and the message from CN $c_j$ to VN $v_i$ is computed as

$$L_{j \to i} = \left[ \prod_{i' \in N(j) \setminus i} \mathrm{sign}(L_{i' \to j}) \right] \cdot \min_{i' \in N(j) \setminus i} |L_{i' \to j}|. \tag{5}$$

In the initialization step, we set $L_{i \to j} = L_i^{ch}$. It can be seen from (4) and (5) that the
min-sum decoding algorithm is insensitive to linear scaling, meaning that linearly scaling all input
messages from the channel would not affect the decoding performance.
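
The update rules above translate almost directly into code. The following Python sketch is an illustration only: the function names and the dictionary-based message layout are hypothetical choices, the BSC and AWGNC LLRs assume the usual mapping of code bit 0 to the symbol +1, and each check node is assumed to have degree at least two.

```python
import math


def bsc_llr(r, p):
    """Channel LLR for a BSC output r in {0, 1} with crossover probability p."""
    return (1 - 2 * r) * math.log((1.0 - p) / p)


def awgn_llr(r, sigma2):
    """Channel LLR for an AWGNC output r, assuming bit 0 is sent as +1."""
    return 2.0 * r / sigma2


def vn_update(l_ch, incoming):
    """Variable-node rule: channel LLR plus all other incoming CN messages.

    incoming maps each neighboring check node to its message to this VN.
    """
    return {
        j: l_ch + sum(m for jp, m in incoming.items() if jp != j)
        for j in incoming
    }


def cn_update_min_sum(incoming):
    """Check-node rule: product of signs times minimum magnitude of the others.

    incoming maps each neighboring variable node to its message to this CN.
    """
    out = {}
    for i in incoming:
        others = [m for ip, m in incoming.items() if ip != i]
        sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
        out[i] = sign * min(abs(m) for m in others)
    return out


msgs = {0: 2.0, 1: -0.5, 2: 3.0}
scaled = {i: 4.0 * m for i, m in msgs.items()}
print(cn_update_min_sum(msgs))
print(cn_update_min_sum(scaled))  # every output is scaled by the same factor
```

Because both rules are built from sums, sign products, and minima, multiplying every input by a positive constant multiplies every output by the same constant, which is the scaling insensitivity noted above.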

Theorem 1: Let $G$ be the Tanner graph of a variable-regular LDPC code that contains a
subgraph $S$ induced by a trapping set. When $S$ satisfies the $k$-separation assumption and when the
messages from the BSC to all VNs outside $S$ are correct, the min-sum decoder can successfully
correct all erroneous VNs in $S$, provided $k$ is large enough.

Proof: Assume VN $v_r \in V_1 \subset S$ is $k$-separated and the corresponding $k$-iteration
computation tree is $T_k(v_r)$. Let $c_r \in C_1$ be the neighboring degree-one CN of $v_r$ in $S$. From the
separation assumption and the assumed correctness of channel messages for VNs outside $S$, all
descendants of $c_r$ in $T_k(v_r)$ receive correct initial messages from the BSC. Like the LLRs of the
BSC outputs, all the initial messages in the decoder, $L_i^{ch}$, $1 \le i \le n$, have the same magnitude.
Denote the subtree starting with CN $c_r$ as $T(c_r)$. With the VN/CN update rules of the min-sum
decoder, we analyze the messages sent from the descendants of $c_r$ in $T(c_r)$. First, according to
the CN update rule described in (5), all messages received by a VN from its children CNs in
$T(c_r)$ must have the same sign as the message received from the channel by this VN, because
all the messages passed in $T(c_r)$ are correct. Therefore, the outgoing message from any VN $v_i$
to its parent CN $c_j$ in $T(c_r)$ satisfies the following equality:

$$|L_{i \to j}| = |L_i^{ch}| + \sum_{j' \in N(i) \setminus j} |L_{j' \to i}|. \tag{6}$$

Moreover, since the LDPC code considered is variable-regular and all the channel messages
from the BSC have the same magnitude, it can be shown that, for the min-sum decoder, all
incoming messages received by a VN from its children CNs in $T(c_r)$ must have the same
magnitude as well. Therefore, the messages sent from VNs of the same level in the computation
tree $T(c_r)$ have the same magnitude. Let $|L_l|$ be the magnitude of the messages sent by the
VNs whose shortest path to a leaf VN contains $l$ CNs in $T(c_r)$. Hence, $|L_0|$ is the magnitude
of messages sent by leaf VNs, as well as the magnitude of channel inputs. Then, we have

$$\begin{aligned} |L_l| &= |L_0| + (d_v - 1)|L_{l-1}| \\ &> (d_v - 1)|L_{l-1}| \\ &\ge (d_v - 1)^l |L_0|, \end{aligned} \tag{7}$$

where $d_v$ is the variable node degree. Hence, it can be seen that the magnitudes of messages
sent towards the root CN $c_r$ of the computation tree $T(c_r)$ grow exponentially, with $d_v - 1$ as
the base, in every upper VN level. Therefore, for $l < k$, the magnitude of the message sent from
$c_r$ to its parent node $v_r$, the $k$-separated root VN of $T_k(v_r)$, in the $l$-th iteration is greater than
$(d_v - 1)^l |L_0|$.
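
The recursion behind (7) is easy to tabulate. The short Python sketch below is only a numerical illustration with hypothetical parameter values ($d_v = 3$, unit channel magnitude); it iterates the level-by-level magnitude and prints it next to the bound $(d_v - 1)^l |L_0|$.

```python
def vn_magnitudes(l0, dv, levels):
    """Magnitudes |L_0|, |L_1|, ... from |L_l| = |L_0| + (dv - 1) * |L_{l-1}|."""
    mags = [l0]
    for _ in range(levels):
        mags.append(l0 + (dv - 1) * mags[-1])
    return mags


dv = 3  # hypothetical variable-node degree
mags = vn_magnitudes(l0=1.0, dv=dv, levels=4)
for level, mag in enumerate(mags):
    # columns: level l, exact magnitude, lower bound (dv - 1)^l * |L_0|
    print(level, mag, float((dv - 1) ** level))
```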

Now, let us consider a branch of the computation tree that starts from another child CN
$c' \in C_S \setminus C_1$ of the root $v_r$ in $T_k(v_r)$, denoted by $T(c')$. Suppose the message received by
$v_r$ from $c'$ after $l$ iterations, denoted by $L'_l$, has a different sign than the message received from
$c_r \in C_1$; otherwise, $v_r$ would already be corrected. Since the induced subgraph of the trapping
set is connected, there exists an integer $t$ such that any $t$-level subtree starting from a VN $v \in S$
in $T(c')$, i.e., a subtree with $t$ levels of VNs, must have at least one CN from the set $C_1$ as
its descendant. Since the number of VNs in the trapping set is $a$, any two VNs in the induced
subgraph are connected by a path of length less than $2a$. Hence, it is obvious that $t < 2a$, and
a tighter upper bound is t < lQ g^ • Of course, the exact value of t depends on the structure 
of the trapping set. For min-sum decoding, every CN in a computation tree takes the minimum
magnitude of messages received from its children VNs. Therefore, each $t$-level subtree can be
considered as a "super-node" with $(d_v - 1)^t$ children VNs, and at least one of these children
VNs has descendants that all receive correct messages from the channel; this means that at least
one of the incorrect messages going into the super-node would be canceled out by one or more
correct messages. So if the output message, $L_{out}$, of such a super-node is incorrect, its magnitude
satisfies

$$|L_{out}| < \left( (d_v - 1)^t - 1 \right) |L_{in}| + |L_{ch}|, \tag{8}$$

where $|L_{in}|$ is the largest magnitude of all incoming incorrect messages, and the second term
$|L_{ch}| = |L_0| \sum_{j=0}^{t-1} (d_v - 1)^j$ is an upper bound on the sum of all channel input LLRs to the VNs in
the $t$-level subtree. Note that the leaf VNs of such $t$-level subtrees are not necessarily the leaf
VNs of $T_k(v_r)$. Thus, we can upper bound the magnitude of the incorrect message sent from $c'$
to $v_r$ after $l$ iterations by

$$\begin{aligned} |L'_l| &< |L_0| \cdot \left[ (d_v - 1)^t - 1 \right]^{\lceil l/t \rceil} + |L_{ch}| \sum_{i=0}^{\lceil l/t \rceil - 1} \left[ (d_v - 1)^t - 1 \right]^i \\ &< |L_{ch}| \cdot |L_0| \cdot \left[ (d_v - 1)^t - 1 \right]^{\lceil l/t \rceil}, \end{aligned} \tag{9}$$

where $\lceil x \rceil$ is the smallest integer not less than $x$.

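Therefore, by taking the logarithm of $|L_l|$ in (7) and $|L'_l|$ in (9), respectively, we have

$$\begin{aligned} \log |L_l| &> \log |L_0| + l \log (d_v - 1) \\ &= \log |L_0| + l \cdot \frac{1}{t} \cdot \log (d_v - 1)^t, \end{aligned} \tag{10}$$

and

$$\begin{aligned} \log |L'_l| &< \log |L_{ch}| + \log |L_0| + \lceil l/t \rceil \log \left[ (d_v - 1)^t - 1 \right] \\ &< \log |L_{ch}| + \log |L_0| + \log \left[ (d_v - 1)^t - 1 \right] + l \cdot \frac{1}{t} \cdot \log \left[ (d_v - 1)^t - 1 \right]. \end{aligned} \tag{11}$$

Note that the first term in (10) and the first three terms in (11) are constants and independent
of the number of iterations $l$.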

Since $\log (d_v - 1)^t > \log [(d_v - 1)^t - 1]$, if $l$ is large enough and there is no limitation imposed
on the magnitude of messages, it is easy to see from (10) and (11) that $|L_l|$ would be greater than
$|L'_l|$ multiplied by any constant. This means that the correct messages coming from outside of
the trapping set to VNs in $V_1$ through their neighboring CNs in $C_1$ will eventually have greater
magnitude than the sum of incorrect messages from other neighboring CNs, i.e.,
$|L_l| > (d_v - 1)|L'_l|$. By letting the number of iterations grow further, the correct messages would eventually
be large enough to correct all erroneous VNs in the trapping set. ■

Remark 1: Note that the upper bound in (9) is extremely loose, and for most small-size
trapping sets, the upper bound is generally less than $|L_0|(d_v - 2)^l$.
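
To see how the two bounds compare numerically, the Python sketch below evaluates the lower bound on $\log |L_l|$ and the first (ceiling) form of the upper bound on $\log |L'_l|$ and searches for the first iteration count at which the correct message is guaranteed to exceed $(d_v - 1)$ times the incorrect one. The parameter values ($d_v = 3$, $t = 2$, unit channel magnitude) and the function names are hypothetical, and the code assumes $d_v \ge 3$ so that all logarithms are defined. Because the upper bound is loose, the resulting iteration count is pessimistic.

```python
import math


def correct_lower_bound(l, l0, dv):
    """Lower bound on log|L_l| for the correct message."""
    return math.log(l0) + l * math.log(dv - 1)


def incorrect_upper_bound(l, l0, dv, t):
    """Upper bound on log|L'_l| using the ceiling form of the bound."""
    base = (dv - 1) ** t - 1
    l_ch = l0 * sum((dv - 1) ** j for j in range(t))
    return math.log(l_ch) + math.log(l0) + math.ceil(l / t) * math.log(base)


def gap(l, l0, dv, t):
    """Positive when the bounds guarantee |L_l| > (dv - 1) * |L'_l|."""
    return (correct_lower_bound(l, l0, dv)
            - incorrect_upper_bound(l, l0, dv, t)
            - math.log(dv - 1))


def first_positive_gap(l0, dv, t, max_l=10_000):
    """Smallest l in [1, max_l] with a positive gap, or None if there is none."""
    for l in range(1, max_l + 1):
        if gap(l, l0, dv, t) > 0:
            return l
    return None


print(first_positive_gap(l0=1.0, dv=3, t=2))
print(gap(100, 1.0, 3, 2))  # the gap keeps growing with l
```

The gap grows linearly in $l$ on average, with slope $\frac{1}{t}\left(\log (d_v - 1)^t - \log[(d_v - 1)^t - 1]\right)$, which is why an unbounded message magnitude is essential to the argument.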

Corollary 2: Let $G$ be the Tanner graph of a variable-regular LDPC code that contains a
subgraph $S$ induced by a trapping set. When $S$ satisfies the $k$-separation assumption and the
channel messages from the AWGNC to all VNs outside $S$ are correct, the min-sum decoder can
successfully correct all erroneous VNs in $S$, provided $k$ is large enough.

Proof: The input LLRs from the AWGNC to the decoder could have different magnitudes,
so we define $|L_{min}|$ and $|L_{max}|$ to be the minimum and maximum magnitude of all LLRs,
respectively. Then, the bounds on $\log |L_l|$ in (10) and on $\log |L'_l|$ in (11) can be extended to the
AWGNC setting as

$$\log |L_l| > \log |L_{min}| + l \cdot \frac{1}{t} \cdot \log (d_v - 1)^t$$

and

$$\log |L'_l| < \log |L_{ch}| + \log |L_{max}| + \log \left[ (d_v - 1)^t - 1 \right] + l \cdot \frac{1}{t} \cdot \log \left[ (d_v - 1)^t - 1 \right].$$

Since $\log |L_{min}|$ and $\log |L_{max}|$ are both constant terms and do not change as $l$ increases, similar
to the BSC case, we can also conclude that the correct messages from outside the trapping set
will eventually have greater magnitude than the incorrect messages within the trapping set. ■

In general, the error-rate performance of min-sum decoding is not as good as that of the more
complicated SPA decoding. However, there are several simple but effective ways to adjust
the CN update rule of min-sum decoding to get performance comparable to SPA decoding. One
method is attenuated-min-sum (AMS) decoding [27], where the magnitudes of messages are
attenuated at CNs. The corresponding CN update rule of AMS is as follows:

$$L_{j \to i} = \left[ \prod_{i' \in N(j) \setminus i} \mathrm{sign}(L_{i' \to j}) \right] \cdot \alpha \cdot \min_{i' \in N(j) \setminus i} |L_{i' \to j}|, \tag{12}$$

where $0 < \alpha < 1$ is the attenuation factor, which can be a fixed constant or adaptively adjusted.
Another way to improve the error-rate performance of min-sum decoding is offset-min-sum
(OMS) decoding, which applies an offset to reduce the magnitudes of CN output messages. The
resulting CN update equation is

$$L_{j \to i} = \left[ \prod_{i' \in N(j) \setminus i} \mathrm{sign}(L_{i' \to j}) \right] \cdot \max\left\{ \min_{i' \in N(j) \setminus i} |L_{i' \to j}| - \beta,\ 0 \right\}, \tag{13}$$

where $\beta > 0$ is the offset which, like the attenuation factor, can be a fixed constant or adaptively
adjusted. In some implementations, for additional simplicity, the attenuation factor or offset is
set to be the same fixed constant for all CNs and all iterations [27].

Theorem 1 and its corollary can be extended to both AMS and OMS decoding. For AMS,
if the attenuation factor is a constant, the proof of Theorem 1 can be directly applied. As for
OMS, the proof follows the proof of Theorem 5 in the next subsection.
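
The two modified check-node rules differ from plain min-sum only in how the minimum magnitude is post-processed. The Python sketch below illustrates this under the same hypothetical conventions as before (dictionary-based messages, fixed $\alpha$ and $\beta$, check-node degree at least two); it is not taken from any particular decoder implementation.

```python
def _sign_and_min(incoming, i):
    """Sign product and minimum magnitude over all inputs except the one from i."""
    others = [m for ip, m in incoming.items() if ip != i]
    sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
    return sign, min(abs(m) for m in others)


def cn_update_ams(incoming, alpha):
    """Attenuated min-sum: scale the minimum magnitude by 0 < alpha < 1."""
    out = {}
    for i in incoming:
        sign, mag = _sign_and_min(incoming, i)
        out[i] = sign * alpha * mag
    return out


def cn_update_oms(incoming, beta):
    """Offset min-sum: subtract beta > 0 from the minimum, clipping at zero."""
    out = {}
    for i in incoming:
        sign, mag = _sign_and_min(incoming, i)
        out[i] = sign * max(mag - beta, 0.0)
    return out


msgs = {0: 2.0, 1: -0.5, 2: 3.0}
print(cn_update_ams(msgs, alpha=0.5))
print(cn_update_oms(msgs, beta=1.0))  # small magnitudes are clipped to zero
```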

B. Trapping Sets and Sum-Product Algorithm Decoding 

In this subsection, we further extend Theorem 1 to sum-product algorithm decoding. The
optimality criterion in the design of the SPA decoder is symbol-wise maximum a posteriori
probability (MAP), and it is an optimal symbol-wise decoder on Tanner graphs without cycles.

In SPA decoding, VNs take log-likelihood ratios of received information from the channel
as initial input messages. The VN update rule is the same as that of min-sum decoding described
in (4), which involves the summation of all incoming extrinsic messages. In the CN update rule
of SPA decoding, the message sent from CN $j$ to VN $i$ is computed as

$$L_{j \to i} = 2 \tanh^{-1} \left( \prod_{i' \in N(j) \setminus i} \tanh \frac{L_{i' \to j}}{2} \right). \tag{14}$$

In practical implementations of the SPA, the following equivalent CN update rule is often
used:

$$L_{j \to i} = \left[ \prod_{i' \in N(j) \setminus i} \mathrm{sign}(L_{i' \to j}) \right] \cdot \phi^{-1} \left( \sum_{i' \in N(j) \setminus i} \phi(|L_{i' \to j}|) \right), \tag{15}$$

where $\phi(x) = -\log[\tanh(x/2)]$ and $\phi^{-1}(x) = \phi(x)$, as shown in Fig. 3. In some fixed-point
implementations, in order to have a better approximation, different look-up tables could be used
to compute $\phi(x)$ and $\phi^{-1}(x)$ [22].

We want to point out that the hyperbolic tangent function, $\tanh(x)$, has numerical saturation
problems when computed with finite precision. For example, in a double-precision floating-point
computer implementation (64-bit IEEE 754) [28], it can be shown that $\tanh(x/2)$ would be
rounded to 1 when $x > 38$, meaning that $\phi^{-1}(\phi(x)) = \infty$ for $x > 38$ [29]. In order to avoid
such problems that can arise from limited precision, thresholds on the magnitudes of messages
must be applied in simulation studies [22].
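
The saturation described above can be observed directly with Python floats, which are IEEE 754 doubles on common platforms. The sketch below is illustrative only; `phi` and `phi_roundtrip` are hypothetical helper names, and the round trip returns infinity explicitly because Python's `math.log(0.0)` raises an exception instead of returning an infinite value.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x / 2)) for x > 0."""
    return -math.log(math.tanh(x / 2.0))


def phi_roundtrip(x):
    """Evaluate phi(phi(x)), which should equal x in exact arithmetic."""
    y = phi(x)
    if y == 0.0:
        # tanh(x / 2) was rounded to 1, so phi(x) collapsed to zero and
        # phi(0) diverges; report the divergence as infinity.
        return math.inf
    return phi(y)


for x in (5.0, 30.0, 40.0):
    saturated = math.tanh(x / 2.0) == 1.0
    print(x, saturated, phi_roundtrip(x))
```

For moderate inputs the round trip recovers the input to within rounding error, while for inputs beyond the saturation point the information about the magnitude is lost entirely.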

In order to maintain the performance advantage of SPA decoding over min-sum decoding,
the quantization method has to preserve the self-inverse property of the $\phi(x)$ function and
to accurately compute the CN update function in (15). However, from Fig. 3, we can see
that it is difficult to have a good approximation of the $\phi(x)$ function with limited resolution,
because this requires both fine precision and large range. Efforts have been made to design
quantization methods that work effectively with the $\phi(x)$ function. For example, a
variable-precision quantization scheme proposed in [20] uses a larger quantization step size for magnitudes
greater than 1, and a smaller step size for magnitudes less than 1. An adaptive uniform quantization
method proposed in [21] uses different quantization step sizes for the outputs of the $\phi(x)$ and the
$\phi^{-1}(x)$ functions in (15). Fig. 3 clearly shows that, if the output of the $\phi(x)$ function is quantized
with finite precision $\epsilon$, inputs greater than $\phi^{-1}(\epsilon)$ cannot be distinguished, and $\phi^{-1}(\epsilon)$ is quite
small even for extremely fine precision, e.g., $\phi^{-1}(10^{-6}) \approx 14.5$. Hence, the largest supported
magnitude during decoding depends on the finest precision of quantization. This means increasing
the quantization range without improving the precision is not beneficial.
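
Since $\phi$ is its own inverse, the largest distinguishable input for a given output precision $\epsilon$ is simply $\phi(\epsilon)$. The brief Python sketch below tabulates this for a few hypothetical precision values; it is a numerical illustration, not part of any quantizer design.

```python
import math


def phi(x):
    """phi(x) = -log(tanh(x / 2)); self-inverse for x > 0."""
    return -math.log(math.tanh(x / 2.0))


for eps in (1e-2, 1e-4, 1e-6):
    # Inputs larger than phi(eps) map to outputs below the precision eps.
    print(eps, phi(eps))
```

Each factor of 100 in precision extends the usable input range by only about $\log 100 \approx 4.6$, which illustrates why a finer step size alone cannot provide a large saturation level.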

In order to avoid dealing with the $\phi(x)$ function, a variety of other CN update rules, most
of which are approximations to the SPA, have been proposed. Some of these approximations are
based on the following equivalent version of the SPA CN update rule represented by (14) or
(15):

$$L_{j \to i} = \underset{i' \in N(j) \setminus i}{\boxplus} L_{i' \to j}, \tag{16}$$

where $\boxplus$ is the pairwise "box-plus" operator defined as

\begin{align}
U \boxplus V &= \log \left( \frac{1 + e^{U+V}}{e^U + e^V} \right) \notag \\
&= \mathrm{sign}(U)\,\mathrm{sign}(V) \cdot \{ \min(|U|, |V|) + s(|U|, |V|) \} \tag{17} \\
&= \mathrm{sign}(U)\,\mathrm{sign}(V) \min(|U|, |V|) + s(U, V) \tag{18}
\end{align}

and

$$s(x, y) = \log \left( 1 + e^{-|x+y|} \right) - \log \left( 1 + e^{-|x-y|} \right). \tag{19}$$

The proof of equivalence between (14) and (16) can be found in [30]. We call such an
implementation box-plus SPA decoding. The formulation above does not have the precision problem
that (14) and (15) have, and, in fact, in a 64-bit double-precision floating-point implementation, the
maximum magnitude of a message that can be supported is approximately $1.8 \times 10^{308}$, which is
the largest double-precision value supported by the IEEE 754 standard. Unlike the $\phi(x)$ function,
the function $\log(1 + e^{-|x|})$, as shown in Fig. 4, can be well quantized or approximated with
piecewise linear functions [29]-[31]. Moreover, if the term $s(x, y)$ is omitted, the box-plus SPA
becomes the min-sum algorithm, and if we replace this term with a constant, we get the offset
min-sum algorithm. As we will show later, the magnitude of the function $s(x, y)$ is upper bounded
by a constant.

Fig. 4. A plot of the function $\log(1 + e^{-|x|})$.

From the CN update rule described in (16) and (18), it can be seen that box-plus SPA decoding
can be considered as min-sum decoding with a small correction factor. The following lemma
relating CN messages in min-sum and SPA decoding was first shown in [32] for SPA decoding
with the CN update equation described in (14). Although we know that the box-plus SPA and
the SPA using the $\phi(x)$ function are equivalent, we include here, for the sake of completeness,
a proof of the lemma for box-plus SPA, as follows.

Lemma 3: Given the same input messages, the CN output in box-plus SPA decoding has the
same sign as, but a smaller magnitude than, that of min-sum decoding.

Proof: Since $s(x,y) = 0$ when $xy = 0$, it can be seen from (17) that the lemma is true
if the inequality $\min(x,y) + s(x,y) > 0$ holds for any positive real values $x$ and $y$. Assuming
$x \ge y > 0$, the following inequalities are equivalent:

\[ \begin{aligned}
& \min(x, y) + s(x, y) > 0 \\
\Leftrightarrow\ & \log e^{y} + \log\frac{1 + e^{-x-y}}{1 + e^{-x+y}} > 0 \\
\Leftrightarrow\ & e^{y} + e^{-x} - 1 - e^{-x+y} > 0 \\
\Leftrightarrow\ & (e^{y} - 1)(1 - e^{-x}) > 0.
\end{aligned} \]

Since $e^{y} > 1$ and $e^{-x} < 1$, the final inequality holds. This proves the lemma. ■
From Lemma 3, we can see that SPA decoding can be thought of as min-sum decoding with a
small correction factor that does not change the sign of the CN output of the min-sum algorithm.

Lemma 4: For any real values $x$ and $y$, the magnitude of $s(x, y)$ is bounded by a constant.
Specifically, $-\log 2 < s(x,y) < \log 2$.

Proof: If $x$ and $y$ have the same sign (i.e., $xy > 0$), we have $|x + y| > |x - y|$ and hence
$s(x,y) < 0$. Without loss of generality, if we assume $x \ge y > 0$, then

\[ \begin{aligned}
s(x,y) &= \log\frac{1 + e^{-x-y}}{1 + e^{-x+y}} \\
&= \log\frac{e^{x} + e^{-y}}{e^{x} + e^{y}} \\
&\ge \log\frac{e^{x} + e^{-x}}{e^{x} + e^{x}} \\
&> \log\frac{1}{2}.
\end{aligned} \]

Therefore, when $xy > 0$, we have $-\log 2 < s(x, y) < 0$. When $xy < 0$, it can be shown similarly
that $0 < s(x, y) < \log 2$, and for the case $xy = 0$, we have $s(x, y) = 0$. ■
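
The two lemmas can be sanity-checked numerically. The sketch below evaluates $s(x,y)$ on a grid of our own choosing and tests the bound of Lemma 4 and the inequality $\min(x,y) + s(x,y) > 0$ used in the proof of Lemma 3. The grid is restricted to moderate magnitudes because the bounds $\pm\log 2$ are approached asymptotically, so for very large inputs floating-point rounding would make the computed value equal to the bound.

```python
import itertools
import math

def s(x: float, y: float) -> float:
    """Correction term (19)."""
    return math.log1p(math.exp(-abs(x + y))) - math.log1p(math.exp(-abs(x - y)))

# Grid -8.0, -7.75, ..., 8.0 (an arbitrary choice for this check).
grid = [i / 4.0 for i in range(-32, 33)]

# Lemma 4: -log 2 < s(x, y) < log 2 for all real x, y.
lemma4_holds = all(-math.log(2) < s(x, y) < math.log(2)
                   for x, y in itertools.product(grid, grid))

# Lemma 3: min(x, y) + s(x, y) > 0 for positive x, y, so the correction
# never flips the sign of the min-sum output.
positive = [g for g in grid if g > 0]
lemma3_holds = all(min(x, y) + s(x, y) > 0
                   for x, y in itertools.product(positive, positive))

print(lemma4_holds, lemma3_holds)
```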

As we discussed earlier, no matter how one designs the fixed-point implementation of the
original SPA using the $\phi(x)$ function, or even with the floating-point implementation, the
round-trip error $|x - \phi^{-1}(\phi(x))|$ of the implemented functions is unbounded. Even if we saturate
both the input and the output of the $\phi(x)$ function, the value of $|x - \phi^{-1}(\phi(x))|$ is still
unbounded and linear in $x$. Therefore, the CN output of a practical implementation of (14) or (15)
can differ significantly from the true value. However, since box-plus SPA decoding can be considered
as min-sum decoding with a correction factor, the implementation error mainly comes from the
computation and quantization of the correction factor, which is a small bounded value, as shown in
Lemma 4.

With Lemma 3 and Lemma 4, we can now extend Theorem 1 to SPA decoding.

Theorem 5: Let $G$ be the Tanner graph of a variable-regular LDPC code that contains a
subgraph $S$ induced by a trapping set. When $S$ satisfies the $k$-separation assumption and all
VNs outside $S$ receive the correct transmitted symbols from the BSC, with proper scaling of all
initial LLRs, the SPA decoder can successfully correct all of the VNs in $S$ that receive incorrect
symbols from the BSC, provided $k$ is large enough.
Proof: From Lemma 3, we know that SPA decoding can be considered as min-sum decoding
with a small correction factor which does not change the sign of the original min-sum output.
Moreover, the magnitude of the CN output of SPA decoding is always less than or equal to that
of min-sum decoding. To compute the output for a CN of degree $d_c$, the box-plus SPA uses the
pairwise box-plus operation (18) at most $\log(d_c - 1)$ times. Hence, the difference between output
messages of the SPA and the min-sum algorithm is upper bounded by $s = \lceil \log(d_c - 1) \rceil \cdot \log 2$,
where $\lceil x \rceil$ is the smallest integer that is not less than $x$.

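By applying an approach similar to that used in the proof of Theorem 1, we can lower bound
the magnitude of messages $L_\ell$ in SPA decoding as follows:

\[ \begin{aligned}
|L_\ell| &\ge |L_0| + (d_v - 1)\left(|L_{\ell-1}| - s\right) \\
&> (d_v - 1)\left(|L_{\ell-1}| - s\right) \\
&\ge (d_v - 1)^{\ell}\,|L_0| - s\sum_{i=1}^{\ell}(d_v - 1)^{i} \\
&= (d_v - 1)^{\ell}\,|L_0| - s\,(d_v - 1)\,\frac{(d_v - 1)^{\ell} - 1}{(d_v - 1) - 1} \\
&= (d_v - 1)^{\ell}\left(|L_0| - \frac{d_v - 1}{d_v - 2}\,s\right) + \frac{d_v - 1}{d_v - 2}\,s.
\end{aligned} \]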

Since all input messages to the decoder from the BSC have the same magnitude, if we scale
the magnitudes of all initial messages such that

\[ |L_0| > \frac{d_v - 1}{d_v - 2}\, s = \frac{d_v - 1}{d_v - 2} \cdot \lceil \log(d_c - 1) \rceil \cdot \log 2, \tag{20} \]

then the magnitudes of messages sent toward $c_r$ in the computation tree $T_k(v_r)$ grow
exponentially in the number of iterations, with base $d_v - 1$. Hence, using the same reasoning as in
the proof of Theorem 1, it can be shown that, if $k$ is large enough and there is no limit on the
magnitudes of messages, the correct messages outside the trapping set eventually overcome the
incorrect messages passed within the trapping set, thereby correcting all erroneous VNs in the
trapping set. ■
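
The role of the scaling condition (20) can be illustrated by iterating the simplified recursion $b_\ell = (d_v - 1)(b_{\ell-1} - s)$ that underlies the lower bound on the message magnitudes. In the sketch below, the value of `s` is a hypothetical placeholder for the bound on the gap between SPA and min-sum outputs, and `dv = 5` is an arbitrary choice; neither is taken from the paper's experiments.

```python
def lower_bound_sequence(L0: float, s: float, dv: int, iterations: int) -> list:
    """Iterate b_l = (dv - 1) * (b_{l-1} - s), starting from b_0 = L0."""
    bounds = [L0]
    for _ in range(iterations):
        bounds.append((dv - 1) * (bounds[-1] - s))
    return bounds

dv = 5
s = 2.0                                  # hypothetical bound on the SPA/min-sum gap
threshold = (dv - 1) / (dv - 2) * s      # right-hand side of (20) for this s

# Slightly above the threshold the bound grows geometrically with base dv - 1;
# slightly below it, the bound decreases and gives no guarantee.
growing = lower_bound_sequence(threshold + 0.1, s, dv, 10)
shrinking = lower_bound_sequence(threshold - 0.1, s, dv, 10)

print(growing[:4])
print(shrinking[:4])
```

The threshold is the fixed point of the recursion, which is why an initial magnitude exactly at the threshold neither grows nor shrinks, and why the distance from the threshold is multiplied by $d_v - 1$ at every step.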
Remark 2: As will be shown in the simulation results, linear scaling of the input LLRs to the
SPA decoder will indeed affect the decoding performance, because the correction factor $s(x,y)$
is not linear in either $x$ or $y$.

Remark 3: If an offset min-sum decoder has a constant or bounded offset value, the proof of
Theorem 5 can be directly applied to obtain a similar result.

The proof of Theorem 5 can be adapted to prove an analogous result for the AWGNC, stated
in the following corollary.

Corollary 6: Let $G$ be the Tanner graph of a variable-regular LDPC code that contains a
subgraph $S$ induced by a trapping set. When $S$ satisfies the $k$-separation assumption and the
messages from the AWGNC to all VNs outside $S$ are correct, with proper scaling of all initial
LLRs, the SPA decoder can successfully correct all erroneous VNs in $S$, provided $k$ is large
enough.

Proof: In analogy to the proof of Lemma 2, let $|L_0|$ be the minimum magnitude of all input
LLRs to the decoder from the AWGNC, and linearly scale the magnitudes of all input messages such
that (20) is satisfied. Then, using reasoning along the lines of the proof of Theorem 5, we can
show that the magnitudes of correct messages outside the trapping set still grow exponentially
with $d_v - 1$ as the base, and eventually they correct all erroneous VNs in the trapping set. ■

In [23], by applying a linear system model and density evolution, the authors obtained some
statistical lower bounds on the exponential growth rate of correct messages in SPA decoding on
the AWGNC.

For most LDPC codes, the trapping sets typically satisfy the $k$-separation assumption only
for small values of $k$. Nevertheless, as described more fully in Section V, in our 64-bit
double-precision floating-point computer simulations of min-sum decoding and box-plus SPA decoding
applied to several LDPC codes traditionally associated with high error floors, we have not
observed, in tens of billions of channel realizations of both the BSC and the AWGNC, any
decoding failure in which the error patterns correspond to the support of a small trapping set.
Moreover, when we force every VN in a trapping set to be in error and all other VNs to be
correct, the floating-point decoders can successfully decode, whereas a decoder implementation
that limits the magnitude of messages cannot resolve the errors in the trapping set and fails to
decode.

IV. New Quantized Decoders with Low Error Floors

In hardware implementations of iterative MP decoding algorithms, a modest number of bits
are used to represent messages, precluding high-resolution representation of messages over a
large numerical range. As reported in the literature, most hardware implementations and their
computer-based simulations use some form of uniform quantization. We will consider uniform
quantizers with quantization step $\Delta$ and $q$-bit representation of quantization levels, with one of
the $q$ bits denoting the sign. The quantized values are $l\Delta$ for $-N \le l \le N$, where $N = 2^{q-1} - 1$.
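
A minimal sketch of such a $q$-bit uniform quantizer follows. The function names are ours, and the tie-breaking rule (a value exactly halfway between two levels is mapped to the lower level) is an assumption made for this illustration.

```python
import math

def saturation_level(q: int, delta: float) -> float:
    """Largest representable magnitude N * delta, with N = 2^(q-1) - 1."""
    return (2 ** (q - 1) - 1) * delta

def uniform_quantize(L: float, q: int, delta: float) -> float:
    """Map L to the nearest level l * delta with |l| <= N, saturating outside the range."""
    N = 2 ** (q - 1) - 1
    # Interval (l*delta - delta/2, l*delta + delta/2] maps to level l (assumed convention).
    level = math.ceil(L / delta - 0.5)
    level = max(-N, min(N, level))
    return level * delta

for q, delta in [(3, 1.0), (4, 1.0), (3, 0.5), (4, 0.5)]:
    print(q, delta, saturation_level(q, delta))
```

Every message whose magnitude exceeds the saturation level is represented by the same value, which is the effect the rest of this section sets out to avoid.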

It has been noticed that error floors can be lowered when more bits are used to represent
messages, either by reducing the step size or by increasing the saturation level [22]. However,
as explained earlier in Section III-B, the maximum magnitude of supported input to the $\phi(x)$
function depends on the quantization resolution at the output. Therefore, uniform quantization
cannot support large values as inputs of the $\phi(x)$ function due to the limited output resolution. Hence,
substantially increasing the saturation level was not considered as a feasible mechanism for
reducing error floors. Even for min-sum decoding, which does not have such a numerical problem
caused by the $\phi(x)$ function, there appears to be no quantization method in the literature that
reduces error floors specifically by means of a deliberate, significant extension of the quantization
range, although it has been noticed that the saturation level could affect error floor performance of
min-sum decoding [19]. Although the authors of [19] claimed that four-bit uniform quantization
suffices to obtain close to the performance of the floating-point min-sum decoder, which is considered
as the ideal min-sum decoder, for most codes over a wide range of SNR, our simulation results,
shown in the next section, demonstrate that uniform-quantized min-sum decoding and its variants
still have high error floors even with eight quantization bits. In the remaining part of this section,
we propose a novel quantization method that substantially extends the quantization range while
maintaining precision in the representation of values of small magnitude. Moreover, we will
show that, even with a small number of quantization bits, the new method provides significant
improvement even over the floating-point implementations.
As shown in the proofs of Theorems 1 and 5 and their corollaries, when a trapping set satisfies
the $k$-separation assumption for a large value of $k$, the magnitudes of correct messages outside
the trapping set grow exponentially in the number of iterations. Therefore, it would be desirable
for the message quantizer to capture, at least to some extent, the exponential increase of these
message magnitudes while retaining precision in the representation of messages with smaller
magnitudes. To this end, we propose a new $(q+1)$-bit quasi-uniform quantization method that
adds an additional bit to $q$-bit uniform quantization to indicate a change of step size in the
representation of large message magnitudes. Hence, the messages after quantization will belong
to an alphabet of size $2^{q+1} - 1$. Specifically, the $(q+1)$-bit quasi-uniform quantization rule is
given by

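\[ Q(L) = \begin{cases}
(l, 0), & \text{if } l\Delta - \frac{\Delta}{2} < L \le l\Delta + \frac{\Delta}{2} \\
(N, 0), & \text{if } N\Delta - \frac{\Delta}{2} < L < dN\Delta \\
(-N, 0), & \text{if } -dN\Delta < L \le -N\Delta + \frac{\Delta}{2} \\
(r, 1), & \text{if } d^{r}N\Delta \le L < d^{r+1}N\Delta \\
(-r, 1), & \text{if } -d^{r+1}N\Delta < L \le -d^{r}N\Delta \\
(N+1, 1), & \text{if } L \ge d^{N+1}N\Delta \\
(-N-1, 1), & \text{if } L \le -d^{N+1}N\Delta
\end{cases} \tag{21} \]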

where $N = 2^{q-1} - 1$, $-N + 1 \le l \le N - 1$, $1 \le r \le N$, and $d$ is a quantization parameter
within the range $(1, d_v - 1]$. Generally, the values represented by the $(q+1)$-bit quasi-uniform
quantization messages $(l, 0)$ are $l\Delta$, and the values of messages $(\pm r, 1)$ are $\pm d^{r}N\Delta$, respectively.
For messages within the range $[-N\Delta, N\Delta]$, the new quasi-uniform quantizer provides the
same precision as a $q$-bit uniform quantizer with quantization step $\Delta$. For messages outside
that range, non-uniform quantization with exponentially increasing step sizes of the form $d^{r}N\Delta$
is used to allow reliable messages to be more accurately represented. Hence, the extra bit in
$(q+1)$-bit quantization can be viewed as an indicator bit, which indicates whether the quantized
message is in the uniform quantization range or in the non-uniform quantization range. Note that,
in the non-uniform quantization range, the value with smallest magnitude is used to represent
quantized values in each interval.

TABLE I

(3+1)-BIT QUASI-UNIFORM QUANTIZATION WITH $\Delta = 1$ AND $d = 3$.

message   range          value   |   message   range           value
0000      (-0.5, 0.5]    0       |   1110      (-0.5, 0.5]     0
0010      (0.5, 1.5]     1       |   1100      (-1.5, -0.5]    -1
0100      (1.5, 2.5]     2       |   1010      (-2.5, -1.5]    -2
0110      (2.5, 9)       3       |   1000      (-9, -2.5]      -3
0001      [9, 27)        9       |   1111      (-27, -9]       -9
0011      [27, 81)       27      |   1101      (-81, -27]      -27
0101      [81, 243)      81      |   1011      (-243, -81]     -81
0111      [243, ∞)       243     |   1001      (-∞, -243]      -243

Table I shows an example of (3+1)-bit quasi-uniform quantization with $\Delta = 1$ and $d = 3$.
The first bit is the sign bit, and the last bit indicates whether the uniform or exponential step
size is used. The uniform quantization range in this example is from $-3$ to $3$ with uniform step
size 1, and the exponential quantization range is above 3 or below $-3$ with nonuniform step
size $3 \cdot (3^{r} - 3^{r-1})$ for $1 \le r \le 4$. For example, all values within the non-uniform quantization
interval $[27, 81)$ would be quantized to 27. The decimal values are used in the VN and CN
update computations, and then the corresponding quantized binary messages are passed between
VNs and CNs.
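
The quantization rule (21) can be sketched in a few lines. The function below returns the decimal value represented by the quantized message rather than its binary label; the function name and argument order are ours, and the examples use the parameters of the (3+1)-bit quantizer with $\Delta = 1$ and $d = 3$ described above.

```python
import math

def quasi_uniform_quantize(L: float, q: int, delta: float, d: float) -> float:
    """Value represented by the (q+1)-bit quasi-uniform quantizer of (21)."""
    N = 2 ** (q - 1) - 1
    mag = abs(L)
    start_exponential = d * N * delta          # first exponential level, d * N * delta
    if mag < start_exponential:
        # Uniform part: (l*delta - delta/2, l*delta + delta/2] -> l, saturating at +-N.
        level = math.ceil(L / delta - 0.5)
        level = max(-N, min(N, level))
        return level * delta
    # Exponential part: largest r in 1..N+1 with d^r * N * delta <= |L|.
    r = 1
    while r < N + 1 and (d ** (r + 1)) * N * delta <= mag:
        r += 1
    return math.copysign((d ** r) * N * delta, L)

for x in [0.4, 2.6, 8.9, 9.0, 30.0, 80.0, 500.0, -8.9, -9.0]:
    print(x, quasi_uniform_quantize(x, q=3, delta=1.0, d=3.0))
```

Within each exponential interval the smallest magnitude is used as the representative value, so a quantized message never overstates the reliability of the message it replaces.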

In comparison to the modified and optimized quantization methods proposed in [19]-[21],
the $(q+1)$-bit quasi-uniform quantizer can represent values of much greater magnitudes. Since
the range of uniformly quantized messages in MP decoders is small in practice, the correct
messages outside a trapping set could reach the saturation level within a few iterations. As a
result, even though correct, these messages may not be large enough to offset the contribution
of incorrect incoming messages for problematic VNs. Hence, even after optimization of the step
and range of a uniform quantizer, the decoder may not produce the same error floor performance
as a floating-point MP decoder. In contrast, the saturation levels of the proposed $(q+1)$-bit
quasi-uniform quantizer are greatly extended, allowing the correct messages outside a trapping set to
grow large enough to overcome all incorrect messages reaching the problematic VNs from other
VNs within the trapping set. It should be noted, however, that the uniform quantization step
included in our quasi-uniform quantization affects the error-rate performance in the waterfall
region, where the magnitudes of messages are generally small and the precision of the messages
is important.

We can further extend the idea of $(q+1)$-bit quasi-uniform quantization to a more general
form. The $(q+1)$-bit quasi-uniform quantization has $n = q + 1$ bits in total to represent $N = 2^{n-1}$
different levels of magnitudes, and as described in (21), $N/2$ levels are allocated to the uniform
quantization range and the other $N/2$ levels have exponential step sizes. The general symmetric
$n$-bit quasi-uniform quantization can also represent $N = 2^{n-1}$ different magnitudes, but it can
have any number, say $N_u$, of levels in the uniform quantization range and the remaining $N - N_u$
levels in the exponential quantization range. With a quantization rule similar to (21), the quantized
values of the general $n$-bit quasi-uniform quantization are $l\Delta$ for $-N_u < l < N_u$, $d^{\,l - N_u + 1} N_u \Delta$
for $l \ge N_u$, and $-d^{\,|l| - N_u + 1} N_u \Delta$ for $l \le -N_u$. Unlike the $(q+1)$-bit quasi-uniform quantization,
the general $n$-bit quasi-uniform quantization does not have a specific indicator bit, and therefore,
it is more flexible in the sense that it can allocate more levels to the uniform range or to the
exponential range to obtain the best performance. Table II shows an example of the general 4-bit
quasi-uniform quantization with $N_u = 5$, $\Delta = 1$, and $d = 3$. The uniform quantization range in this
example is from $-4$ to $4$ with uniform step size 1, and the exponential range is above 4 or below
$-4$ with exponential step size $4 \cdot (3^{r} - 3^{r-1})$ for $1 \le r \le 3$.
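
The level allocation of the general quantizer can be sketched as follows. This is an illustration under stated assumptions rather than the exact rule: the parameter `n_uniform` counts the uniform magnitude levels including zero, and the exponential levels are taken to be the largest uniform value multiplied by successive powers of $d$, a choice made here so that the 4-bit example with five uniform levels, $\Delta = 1$ and $d = 3$ yields the magnitudes 0, 1, 2, 3, 4, 12, 36, 108. All names are ours.

```python
import math

def general_levels(n: int, n_uniform: int, delta: float, d: float) -> list:
    """Non-negative reconstruction values of an n-bit quasi-uniform quantizer."""
    total = 2 ** (n - 1)                              # number of magnitude levels
    uniform = [l * delta for l in range(n_uniform)]   # 0, delta, ..., (n_uniform-1)*delta
    base = uniform[-1]                                # largest uniform value (assumed base)
    exponential = [base * d ** r for r in range(1, total - n_uniform + 1)]
    return uniform + exponential

def general_quantize(L: float, n: int, n_uniform: int, delta: float, d: float) -> float:
    """Quantize L using the levels above; smallest magnitude represents each interval."""
    exponential = general_levels(n, n_uniform, delta, d)[n_uniform:]
    mag = abs(L)
    if not exponential or mag < exponential[0]:
        top = n_uniform - 1
        level = max(-top, min(top, math.ceil(L / delta - 0.5)))
        return level * delta
    value = max(v for v in exponential if v <= mag)
    return math.copysign(value, L)

print(general_levels(4, 5, 1.0, 3.0))
print(general_quantize(35.0, 4, 5, 1.0, 3.0))
```

Moving `n_uniform` up or down trades resolution near zero against dynamic range, which is the flexibility the general form is meant to provide.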

Although the motivation for the proposed quasi-uniform quantization method came from an
analysis of message-passing decoder behavior on trapping sets that satisfy the $k$-separation
assumption for large $k$, a property not satisfied by trapping sets in practical LDPC codes, the
simulation results in the next section demonstrate a significant reduction in the error floors of
two representative LDPC codes, with no increase in the decoding complexity.
TABLE II

4-BIT QUASI-UNIFORM QUANTIZATION WITH $\Delta = 1$ AND $d = 3$.

message   range          value   |   message   range           value
0000      (-0.5, 0.5]    0       |   1111      (-0.5, 0.5]     0
0001      (0.5, 1.5]     1       |   1110      (-1.5, -0.5]    -1
0010      (1.5, 2.5]     2       |   1101      (-2.5, -1.5]    -2
0011      (2.5, 3.5]     3       |   1100      (-3.5, -2.5]    -3
0100      (3.5, 12)      4       |   1011      (-12, -3.5]     -4
0101      [12, 36)       12      |   1010      (-36, -12]      -12
0110      [36, 108)      36      |   1001      (-108, -36]     -36
0111      [108, ∞)       108     |   1000      (-∞, -108]      -108

V. Numerical Results 

To demonstrate the improved performance offered by our proposed quasi-uniform quantization 
method, we compare its error-rate performance to that of uniform quantization with several types 
of MP decoders applied to two known LDPC codes on the BSC and the AWGNC. The two LDPC 
codes we evaluated are a rate-0.3 (640,192) quasi-cyclic (QC) LDPC code [17] and the rate-0.5 
(2640,1320) Margulis LDPC code [4]. The frame error rate (FER) curves are based on Monte 
Carlo simulations that generated at least 200 error frames for each point in the plots, and the 
maximum number of decoding iterations was set to 200. 

The (640,192) QC-LDPC code, designed by Han and Ryan [17], is a variable-regular code
with variable degree 5 and check degrees ranging from 5 to 9. It has 64 isomorphic (5,5)
trapping sets and 64 isomorphic (5,7) trapping sets. We applied our exhaustive trapping set
search algorithm [33] to this code, and these are the only two types of $(a, b)$ trapping set for
$a \le 15$ and $b \le 7$. The error floor starts relatively high for MP decoders with limited message
range, so it is quite easy to reach the error floor with Monte Carlo simulation.

Fig. 5 shows the probability density function (pdf) of the magnitude of messages passed within
the iterative message-passing decoders. Fig. 5(a) shows the pdf for the min-sum decoder applied
to the (640,192) QC-LDPC code on the BSC with $p = 0.03$, where the magnitude of all input
LLRs is scaled to 1. Fig. 5(b) shows the pdf of the SPA decoder applied to the Margulis code
of length 2640 on the AWGNC with $E_b/N_0 = 2.25$ dB. The data in each figure were obtained
by using the corresponding floating-point MP decoders on sequences of received symbols from
more than 10 million channel realizations, and gathering all the messages passed on all edges
during all decoding iterations to generate the pdf. In the simulation, the iterative MP decoder
stops when a codeword is found or when it reaches the maximum number of iterations, namely
200. We can see from the figures that there is a considerable number of messages with large
magnitude, which can be either favorable or unfavorable, and by further examining the simulation
data, we found that such strong messages, in general, help to successfully decode the received
symbols, as suggested by the idealized theoretical analysis described in Section III.

Fig. 5. Probability density function of magnitude of messages. (a) Min-sum decoder on the
(640,192) QC-LDPC code over BSC of $p = 0.03$ and $|\mathrm{LLR}| = 1$. (b) SPA decoder on the Margulis
code of length 2640 over AWGNC of $E_b/N_0 = 2.25$ dB. Horizontal axis of both panels: magnitude
of messages.

Figs. 6-8 show simulation results for various types of quantized min-sum decoders and
floating-point MS decoders. For the BSC, we scaled the magnitudes of decoder input messages
from the channel to 1, since, for linear decoders such as Gallager-B and min-sum, the scaling
of channel input messages does not affect the decoding performance. For attenuated and offset
min-sum decoding, we can compensate for the scaling by adjusting the attenuation and the offset
factor, respectively. The uniform quantization step size $\Delta$ is set to 1 or 0.5. So, for example,
when $\Delta = 1$, the 3-bit uniform quantizer produces values $\{\pm 3, \pm 2, \pm 1, 0\}$, and the (3+1)-bit
quasi-uniform quantizer with $d = 3$ yields the values described in Table I. In the simulation, the
parameter $d$ was heuristically chosen by testing different values, and when $q$ is large, a small $d$
would be enough to represent a large range of magnitudes.

Fig. 6. FER results of min-sum (MS) decoder on the (640,192) QC-LDPC code on BSC. Uniform
quantization step $\Delta = 1$ or 0.5, and $d = 3$ in $(q+1)$-bit quasi-uniform quantization.

Fig. 7. FER results of min-sum (MS) decoder on the (640,192) QC-LDPC code on AWGNC. The
offset factor $\beta = 0.5$, uniform quantization step $\Delta = 0.5$, and $d = 3$ in $(q+1)$-bit quasi-uniform
quantization.

Fig. 8. FER results of offset min-sum (OMS) decoder on the (640,192) QC-LDPC code on AWGNC.
The offset factor $\beta = 0.5$, uniform quantization step $\Delta = 0.5$, and $d = 2$ in $(q+1)$-bit
quasi-uniform quantization.

In Fig. 6, we see that the slopes of the error floors resulting from uniform quantization are close
to that of the Gallager-B decoder error floor. This is because, when most messages saturate at
the same magnitude, min-sum decoding essentially degenerates to Gallager-B decoding, relying
solely upon the signs of messages. Comparing uniform quantizers with the same number of bits
but different step sizes, we notice that a smaller step size produces better waterfall performance but
a higher error floor. This observation can be explained by the saturation level of these quantizers.
For example, 3-bit and 4-bit uniform quantization with step size $\Delta = 1$ saturates at 3 and 7, and
with $\Delta = 0.5$ saturates at 1.5 and 3.5, respectively. The stronger messages, i.e., messages with
larger magnitudes, can be helpful or harmful to the decoding process, depending on whether
they are correct or not. The correct strong messages can help overcome the incorrectly received
bits, but the strong incorrect messages would negatively influence the correctly received bits. In
the error-floor region, when the channel condition is good, very few bits are received incorrectly,
and as we showed in Theorems 1 and 5, large saturation levels allow correct messages to build
up and overcome the incorrect messages in trapping sets. Therefore, in Fig. 6, the error floors
produced by the different uniform quantizers are strictly in the order of their corresponding
saturation levels.

However, in the waterfall region where many bits are received incorrectly, reducing the
saturation level limits the propagation of strong incorrect messages. Moreover, in this specific
case, another reason for the better performance of the quantization with step size $\Delta = 0.5$ is
that, since the magnitudes of input LLRs to the min-sum decoder from the BSC are scaled to 1,
the appearance of nonintegral saturated messages, together with a small saturation level, reduces
the possibility of the summation of messages at a VN being zero, which could cause oscillatory
behavior in the decoder. We can see from Fig. 6 that MS decoding with (3+1)-bit quasi-uniform
quantization and $\Delta = 0.5$ performs even better than the floating-point MS decoder. This is
because its inherent uniform quantization with small saturation level improves the waterfall
performance, as just explained, and the incorrect messages are not able to grow large enough
to reach the next non-uniform quantization level. In other words, the proposed quasi-uniform
quantization with carefully chosen parameters allows the correct messages to grow exponentially
and limits the growth of incorrect messages. Similar results can also be found in Fig. 7, where
(3+1)-bit quasi-uniform quantization also outperforms the floating-point MS decoder. However,
we want to point out that such gains are highly code-dependent, and we conjecture that codes
of higher variable degree would benefit more from the quasi-uniform quantization.

In Figs. 9-11, we show the simulation results of the quasi-uniform quantization method
applied to the SPA decoder. In the simulation of quantized SPA decoding, the input LLRs and the
messages passed between CNs and VNs are quantized values, but all the CN updates are done
with floating-point computation of the box-plus update rule in (16). Therefore, the simulation
results for SPA decoding with quantized messages shown here can be considered the best
error-rate performance possible with any fixed-point implementation of quantized SPA decoding.

Figs. 9 and 10 show performance results for the (640,192) QC-LDPC code on the BSC and
AWGNC, where the non-uniform quantization parameter of the quasi-uniform quantizer was set
to $d = 1.5$ because, with $q \ge 5$, a large range of message values is covered, even with such a
small $d$. We see that, on both channels, the SPA decoder with quasi-uniform quantization yields
FER results very close to those obtained with the floating-point SPA implementation. Decoding
with uniform quantization, on the other hand, suffers from high error floors.

Fig. 9. FER results of SPA decoder on the (640,192) QC-LDPC code on BSC. Uniform quantization
step $\Delta$, $d = 1.3$ in $(q+1)$-bit quasi-uniform quantization.

Fig. 10. FER results of SPA decoder on the (640,192) QC-LDPC code on AWGNC. The uniform
quantization step $\Delta = 0.25$, and $d = 1.5$ in $(q+1)$-bit quasi-uniform quantization.

Fig. 11. FER results of approximate-SPA decoder on the Margulis code of length 2640 on AWGNC.
Uniform quantization step $\Delta = 0.25$, and $d = 1.3$ in $(q+1)$-bit quasi-uniform quantization.

In Fig. 9, the magnitude of input LLRs from the BSC to quantized SPA decoders was scaled
to 2; as we pointed out in earlier sections, such scaling could have an impact on the error-rate
performance. The figure compares two floating-point SPA decoders that use different scalings of
the input LLR magnitude. One uses the exact LLR value whose magnitude is $\left|\log\frac{1-p}{p}\right|$, where
$p$ is the channel error probability; the other scales the magnitude of all input LLRs to 2. We
can see from the figure that the floating-point SPA decoder with scaled input LLRs from the
BSC has slightly better performance, especially when the channel quality is good, i.e., where $p$
is small. We note that the choice of the input LLR magnitude is heuristic and depends on the
underlying LDPC code. Scaling the magnitude to 2 in this case may not be optimal, but we
indeed found that scaling the magnitude to 1 would greatly degrade the error-rate performance.
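
For concreteness, the two input conventions compared here can be sketched as follows. The helper names and the sign convention (bit 0 mapped to a positive LLR) are assumptions made for this illustration.

```python
import math

def bsc_llr_magnitude(p: float) -> float:
    """Exact LLR magnitude |log((1 - p) / p)| for a BSC with error probability 0 < p < 1."""
    return abs(math.log((1.0 - p) / p))

def channel_llrs(received_bits, magnitude: float) -> list:
    """Decoder inputs of a fixed magnitude (assumed convention: bit 0 -> +, bit 1 -> -)."""
    return [magnitude if b == 0 else -magnitude for b in received_bits]

p = 0.03
exact = channel_llrs([0, 1, 0, 0], bsc_llr_magnitude(p))   # exact-LLR decoder input
scaled = channel_llrs([0, 1, 0, 0], 2.0)                   # magnitudes scaled to 2
print(exact)
print(scaled)
```

The exact magnitude grows as $p$ decreases, so a fixed scaled magnitude changes the ratio between channel inputs and the bounded correction term $s(x,y)$, which is one way to see why the scaling matters for SPA decoding but not for min-sum decoding.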
In Fig. 11, we show results for SPA decoding of the length-2640 Margulis code. For this
example, we adopted the following two-piece linear approximation from [|3D in the computation
of the $s(x,y)$ function in (19) for box-plus SPA decoding,

\[ \log\left(1 + e^{-|x|}\right) \approx \begin{cases}
0.6 - 0.24|x|, & \text{if } |x| < 2.5 \\
0, & \text{otherwise.}
\end{cases} \tag{22} \]

The approximated box-plus SPA decoder ran about five times faster than the floating-point SPA
decoder, with a performance penalty of less than 0.02 dB in the waterfall region. In Fig. 11,
we also include the dual quantization SPA decoding proposed by Zhang et al. [21], where the
$\phi(x)$ function is quantized into a mapping table, denoted as $\hat{\phi}(x)$. Following the notation in
[21], we use dual quantization with parameters Q4.2/1.5, Q5.2/1.6, and Q6.2/1.7 for 6-bit, 7-bit,
and 8-bit quantizers, respectively. The Qm.f quantizer uses uniform quantization to represent
a signed fixed-point number with $m$ bits to the left of the radix point for the integer part and
$f$ bits to the right of the radix point for the fractional part. For example, a Q4.2 quantizer has
a uniform quantization step size of 0.25 and a range $[-7.75, 7.75]$. Hence, all the quantization
methods compared here have the same uniform step size of 0.25 when quantizing the channel
input LLRs. From the plot of the $\phi(x)$ function in Fig. 3, we can see that the saturation level
$\hat{\phi}(0)$ is limited by the quantization step size, because it is desirable to have $\hat{\phi}(0) \le x$ for all $x$
satisfying $\hat{\phi}(x) = 0$. In other words, in the dual quantization scheme, the saturation level has to
match the resolution of the quantizer; otherwise the error-rate performance in both the waterfall
region and the error-floor region will be significantly degraded. Based on error-rate simulations
using a range of saturation levels for dual quantization methods, we chose the saturation level
$\hat{\phi}(0)$ to be 5.5, 7, and 8 for the 6-bit, 7-bit, and 8-bit quantizers, respectively. Moreover, due
to the use of a mapping table with limited precision for the quantized $\hat{\phi}(x)$ function, the
error-rate performance in the error-floor region of SPA decoding with any dual quantization scheme
is strictly worse than that of box-plus SPA decoding with uniform quantization, assuming the
same number of bits and the same quantization step size are used in the representation of the
channel LLR. The gap is greater when decoders are allowed to use more quantization bits, as
clearly shown in the simulation results.
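
The quality of a two-piece linear approximation of the form $0.6 - 0.24|x|$ for $|x| < 2.5$ and zero otherwise can be gauged with a short numerical comparison against $\log(1 + e^{-|x|})$. The function names and the evaluation grid below are ours, and the approximate correction term is formed by substituting the approximation for both logarithms in the definition of $s(x,y)$.

```python
import math

def g_exact(x: float) -> float:
    """log(1 + e^{-|x|})."""
    return math.log1p(math.exp(-abs(x)))

def g_approx(x: float) -> float:
    """Two-piece linear approximation: 0.6 - 0.24|x| if |x| < 2.5, else 0."""
    ax = abs(x)
    return 0.6 - 0.24 * ax if ax < 2.5 else 0.0

def s_exact(x: float, y: float) -> float:
    return g_exact(x + y) - g_exact(x - y)

def s_approx(x: float, y: float) -> float:
    return g_approx(x + y) - g_approx(x - y)

# Worst-case gap of the one-dimensional approximation on [0, 10].
worst = max(abs(g_exact(t / 100.0) - g_approx(t / 100.0)) for t in range(0, 1001))
print(worst)
print(s_exact(1.0, 2.0), s_approx(1.0, 2.0))
```

Because the approximation needs only a comparison, a multiplication and a subtraction, it avoids the exponential and logarithm evaluations of the exact correction term.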

In Figs. 10 and 11, the proposed (5+1)-bit quasi-uniform quantization again has the best
error-floor performance among all quantized decoders, and is even better than the double-precision
floating-point box-plus SPA decoder. We observed from the simulation data that the floating-point
SPA generally requires more iterations to decode a codeword than the quasi-uniform quantized
SPA, especially in the high SNR region. Since the maximum number of iterations is set to 200,
the quasi-uniform quantized SPA is able to outperform its floating-point counterpart. The fast
convergence of the quasi-uniform quantized SPA derives from its non-uniform step size. From
the idealized theoretical analysis, we know that the exponential growth rate of correct messages
is larger than that of incorrect messages. In quasi-uniform quantization with proper parameters,
the correct messages can reach the higher magnitude level earlier than the incorrect messages,
and the incorrect messages are more likely to be quantized to lower magnitude levels. Hence,
the correct messages overcome incorrect messages faster, and the decoder can converge to a
codeword after fewer iterations.

In all of the decoding failures observed when using the quasi-uniform quantizer with MS 
decoding and SPA decoding, no error pattern corresponded to the support of a small trapping set. 
With uniform quantization, on the other hand, almost all of the decoding failures corresponded 
to small trapping sets when the channel error probability of the BSC was small or the SNR of the 
AWGNC was high. We also compared decoder performance on sequences in which every VN in 
a single trapping set of type (5,5) or (5,7) of the (640,192) code was incorrect, with all other VNs 
set to correct values. In all cases, the large-range message-passing decoder and the message- 
passing decoder with the proposed quasi-uniform quantization method decoded successfully, 
while decoders with the uniform quantizer failed. The same results were also obtained for the 
(12,4) and (14,4) trapping sets in the Margulis code. The analytical and numerical results in this
paper are only for variable-regular LDPC codes; further work is needed to extend this approach
to variable-irregular codes.
VI. Conclusion

Trapping sets and other error-prone substructures are known to influence the error-rate perfor- 
mance of LDPC codes with iterative message-passing decoding. In this paper, we have shown that 
the use of uniform quantization in iterative MP decoding can be a significant factor contributing 
to the error floor phenomenon in LDPC code performance. An analysis of iterative MP decoding 
in an idealized setting suggests that decoder message saturation plays a key role in the occurrence 
of errors in small trapping sets, leading to observed error floor behaviors. To address this problem, 
we proposed a novel quasi-uniform quantization method that effectively extends the dynamic 
range of the quantizer. Without modifying the CN and VN update rules or adding extra stages to 
standard iterative decoding algorithms, the use of this quantizer was shown to significantly lower 
the error floors of two well-studied LDPC codes when used with various iterative MP decoding
algorithms on the BSC and AWGNC. Simulation results for a (640,192) QC-LDPC code and 
the (2640,1320) Margulis code confirmed that this new quantization method can significantly 
reduce the error floors of these codes with essentially no increase in decoding complexity. 